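The instruction set architecture (ISA) is the fundamental contract between software and hardware. It defines the programmer-visible state and the specific operations the processor performs. The Y86-64 ISA is a simplified, educational subset of x86-64 that reduces the complex CISC design to a more manageable model while retaining its register-intensive procedure call mechanism.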
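1. Programmer-Visible State
This state comprises a register file (RF) of 15 registers, condition codes (CC) used for control flow, a program counter (PC), memory, and a status code (Stat) that indicates normal operation (AOK), halt (HLT), or an error (ADR/INS).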
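As a rough illustration of this state, the following Python sketch groups it into a single object. The class name `Y86State`, the field names, and the initial condition-code values are assumptions made for this example only. The memory field is included because instructions such as rmmovq read and modify it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stat(Enum):
    AOK = 1  # normal operation
    HLT = 2  # halt instruction executed
    ADR = 3  # invalid address encountered
    INS = 4  # invalid instruction encountered


# The 15 program registers, listed in order of their register IDs (0x0 to 0xE).
REGISTER_NAMES = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14",
]


@dataclass
class Y86State:
    """Hypothetical container for the programmer-visible state."""
    regs: dict = field(default_factory=lambda: {n: 0 for n in REGISTER_NAMES})
    zf: bool = True   # zero flag (initial value is an assumption)
    sf: bool = False  # sign flag
    of: bool = False  # overflow flag
    pc: int = 0       # program counter
    stat: Stat = Stat.AOK
    memory: dict = field(default_factory=dict)  # sparse, byte-addressed


state = Y86State()
state.regs["rsp"] = 0x200  # e.g. set up a stack pointer
```

A dictionary is used for the register file purely for readability; a hardware register file is indexed by the 4-bit register ID instead.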
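2. CISC vs. RISC Characteristics
Although x86-64 is a typical CISC architecture, Y86-64 leans toward RISC. Its encoding is simpler and more regular (each instruction type has its own fixed length, from 1 to 10 bytes), and it is a strict load/store architecture in which memory can be accessed only through dedicated move instructions such as rmmovq rA, D(rB).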
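To make the encoding concrete, here is a minimal Python sketch that encodes three move instructions into bytes. The helper names (`encode_rrmovq`, `encode_irmovq`, `encode_rmmovq`) are hypothetical. The byte layout follows the usual Y86-64 convention: an icode:ifun byte, a register byte, and, where needed, an 8-byte little-endian constant.

```python
# Register IDs 0x0 to 0xE; 0xF marks "no register".
REG_ID = {
    "%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
    "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
    "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
    "%r12": 0xC, "%r13": 0xD, "%r14": 0xE,
}
RNONE = 0xF


def _const8(value):
    """Encode a 64-bit constant as 8 little-endian bytes (two's complement)."""
    return (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


def encode_rrmovq(ra, rb):
    """rrmovq rA, rB: 2 bytes."""
    return bytes([0x20, (REG_ID[ra] << 4) | REG_ID[rb]])


def encode_irmovq(value, rb):
    """irmovq V, rB: 10 bytes; the rA field is unused and set to 0xF."""
    return bytes([0x30, (RNONE << 4) | REG_ID[rb]]) + _const8(value)


def encode_rmmovq(ra, disp, rb):
    """rmmovq rA, D(rB): 10 bytes; the only way to store a register to memory."""
    return bytes([0x40, (REG_ID[ra] << 4) | REG_ID[rb]]) + _const8(disp)


print(encode_irmovq(15, "%rbx").hex(" "))
print(encode_rrmovq("%rbx", "%rcx").hex(" "))
print(encode_rmmovq("%rsp", 0x123456789ABCD, "%rdx").hex(" "))
```

The lengths differ between instruction types (2 versus 10 bytes here), but every instance of a given type has the same length, which keeps decoding simple.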
QUESTION 1
Modify the sum function (Figure 4.6) to implement absSum using a conditional jump. Which approach is most architecturally sound for Y86-64?
Using jge to skip a subq instruction that negates the value.
Using a call to a separate absolute value function.
Using a memory-to-memory comparison.
Changing the status code to INS if the value is negative.
✅ Correct!
In Y86-64, we test the value (andq %r10, %r10) and then use jge to jump past the negation if the value is non-negative.
❌ Incorrect
Y86-64 does not support memory-to-memory comparisons, and the status code is not a mechanism for program logic.
QUESTION 2
When implementing absSum with conditional move (cmovXX), how do we handle the sign inversion?
Subtract the value from zero in a temporary register, then use cmovl to copy the negated result back.
Use the iaddq instruction to flip the bits.
Y86-64 performs sign inversion automatically during mrmovq.
Conditional moves cannot be used for arithmetic logic.
✅ Correct!
We compute -x (e.g., using subq) and then use cmovl to replace x with its negative if the original was less than zero.
❌ Incorrect
cmovXX only moves data; it does not perform arithmetic during the move itself.
QUESTION 3
What is the byte encoding for the sequence: irmovq $15, %rbx; rrmovq %rbx, %rcx? (Starting at 0x100)
0x100: 30 f3 0f 00 00 00 00 00 00 00; 0x10a: 20 31
0x100: 30 3f 0f 00; 0x104: 20 13
0x100: 60 31; 0x102: 30 f3
0x100: 70 00; 0x102: 20 31
✅ Correct!
irmovq is 10 bytes (30 F rB ValC) and rrmovq is 2 bytes (20 rA rB).
❌ Incorrect
Recall that Y86-64 instructions are fixed-length for specific types; irmovq always takes 10 bytes including the 8-byte constant.
QUESTION 4
Determine the HCL code for the control signal mem_write in a SEQ processor.
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };
bool mem_write = icode in { IMRMOVQ, IPOPQ, IRET };
bool mem_write = (valE == valM);
bool mem_write = stat == AOK;
✅ Correct!
Only instructions that push to the stack or store to memory (rmmovq, pushq, call) trigger a memory write.
❌ Incorrect
IMRMOVQ, IPOPQ, and IRET perform memory reads, not writes.
QUESTION 5
In the PIPE implementation, when should the signal E_bubble be set?
On mispredicted branches or load-use hazards.
Every time the PC is updated.
Only when the processor hits a HALT instruction.
When the Register File is being read.
✅ Correct!
E_bubble injects a bubble into the execute stage, either to cancel an instruction fetched after a mispredicted branch or to fill the gap left when a load-use hazard stalls the earlier stages.
❌ Incorrect
Bubbling is a specific hazard management technique, not a standard part of every cycle.
Case Study: Architectural Optimization and Logic
Advanced Y86-64 Implementation Details
You are tasked with extending the Y86-64 design. Consider the introduction of the iaddq instruction and the performance limits of a pipelined system with $k$ stages and overhead $T_{overhead}$.
Q
1. [Writing Task] Rewrite the Y86-64 sum function of Figure 4.6 to make use of the iaddq instruction. (Output: ~14 lines).
Solution:
Model Solution:
sum:
    irmovq $0, %rax        # 1: sum = 0
    andq %rsi, %rsi        # 2: set CC
    jmp test               # 3: start test
loop:
    mrmovq (%rdi), %rdx    # 4: get *start
    addq %rdx, %rax        # 5: sum += *start
    iaddq $8, %rdi         # 6: start++ (Optimization!)
    iaddq $-1, %rsi        # 7: count-- (Optimization!)
test:
    jg loop                # 8: if count > 0, loop
    ret                    # 9: return
Note: This removes the need for registers %r8 and %r9, previously used to store the constants 8 and 1.
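One way to sanity-check the loop structure is to mirror it step by step in Python. The sketch below is an assumption-laden model, not a Y86-64 simulator: the array is a Python list, %rdi is treated as a byte offset into it, and the jg condition is modeled as "signed count > 0". The function name `y86_sum` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def y86_sum(values):
    """Mirror the iaddq-based sum loop one instruction at a time."""
    rax = 0                                 # irmovq $0, %rax
    rdi = 0                                 # byte offset standing in for 'start'
    rsi = len(values)                       # count
    count_positive = to_signed(rsi) > 0     # andq %rsi, %rsi (sets CC)
    while count_positive:                   # test: jg loop
        rdx = values[rdi // 8] & MASK64     # mrmovq (%rdi), %rdx
        rax = (rax + rdx) & MASK64          # addq %rdx, %rax
        rdi = (rdi + 8) & MASK64            # iaddq $8, %rdi
        rsi = (rsi - 1) & MASK64            # iaddq $-1, %rsi (sets CC)
        count_positive = to_signed(rsi) > 0
    return to_signed(rax)                   # ret


print(y86_sum([1, 2, 3]))
```

Because iaddq sets the condition codes itself, the decrement of the count doubles as the loop test, which is why no separate compare instruction appears before jg.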
Q
2. Write HCL code for a circuit that selects the median of word inputs A, B, and C.
Solution:
word median = [
    (A <= B && B <= C) || (C <= B && B <= A) : B;
    (B <= A && A <= C) || (C <= A && A <= B) : A;
    1 : C;
];
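The case expression can be checked by transcribing it into Python and comparing it with a sort-based median over a small range of inputs. The function names `median` and `check_median` are illustrative, and the exhaustive check covers only a small sample range, so it is a sanity test rather than a proof.

```python
from itertools import product


def median(a, b, c):
    """Transcription of the HCL case expression: the first matching case wins."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return c  # default case, corresponds to '1 : C'


def check_median(lo=-2, hi=2):
    """Compare against the sorted middle element; return the number of cases tried."""
    cases = 0
    for a, b, c in product(range(lo, hi + 1), repeat=3):
        assert median(a, b, c) == sorted([a, b, c])[1]
        cases += 1
    return cases


print(check_median())
```

As in HCL, the order of the cases matters: the final default is reached only when neither A nor B lies between the other two values, which leaves C as the median.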
Q
3. As the number of pipeline stages $k$ goes to infinity, what happens to the throughput?
Solution:
Throughput = $1 / (T/k + T_{overhead})$, where $T$ is the total combinational logic delay, divided evenly across the $k$ stages. As $k$ approaches infinity, the term $T/k$ vanishes and the throughput approaches the limit $1 / T_{overhead}$. Pipeline overhead therefore eventually becomes the bottleneck for processor speed.
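A short numerical sketch shows how quickly the gains flatten out. The delay values used here (300 ps of logic, 20 ps of per-stage overhead) are illustrative assumptions, and the function names are hypothetical. The model also assumes the logic splits evenly across stages.

```python
def throughput(k, logic_delay, overhead):
    """Instructions per unit time for a k-stage pipeline with evenly split logic."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 / (logic_delay / k + overhead)


def throughput_limit(overhead):
    """Limit of the throughput as k grows without bound."""
    return 1.0 / overhead


T_LOGIC = 300.0    # ps, assumed total combinational delay
T_OVERHEAD = 20.0  # ps, assumed per-stage register overhead

for k in (1, 3, 10, 100, 1000):
    # Multiply by 1000 to convert instructions/ps into giga-instructions/s.
    print(k, round(1000 * throughput(k, T_LOGIC, T_OVERHEAD), 2))

print("limit", 1000 * throughput_limit(T_OVERHEAD))
```

Each additional stage shortens the clock period by less than the previous one did, so the throughput climbs toward the limit without ever reaching it.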